NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

S‐ GMAS : Genome‐Wide Mediation Analysis With Brain Subcortical Shape Mediators

https://doi.org/10.1002/hbm.70297

Ding, Shengxian; Liu, Rongjie; Srivastava, Anuj; Nowakowski, Richard S; Shen, Li; Thompson, Paul M; Zhang, Heping; Huang, Chao (August 2025, Human Brain Mapping)

ABSTRACT Mediation analysis is widely utilized in neuroscience to investigate the role of brain image phenotypes in the neurological pathways from genetic exposures to clinical outcomes. However, it is still difficult to conduct mediation analyses with whole genome‐wide exposures and brain subcortical shape mediators due to several challenges including (i) large‐scale genetic exposures, that is, millions of single‐nucleotide polymorphisms (SNPs); (ii) nonlinear Hilbert space for shape mediators; and (iii) statistical inference on the direct and indirect effects. To tackle these challenges, this paper proposes a genome‐wide mediation analysis framework with brain subcortical shape mediators. First, to address the issue caused by the high dimensionality in genetic exposures, a fast genome‐wide association analysis is conducted to discover potential genetic variants with significant genetic effects on the clinical outcome. Second, the square‐root velocity function representations are extracted from the brain subcortical shapes, which fall in an unconstrained linear Hilbert subspace. Third, to identify the underlying causal pathways from the detected SNPs to the clinical outcome implicitly through the shape mediators, we utilize a shape mediation analysis framework consisting of a shape‐on‐scalar model and a scalar‐on‐shape model. Furthermore, the bootstrap resampling approach is adopted to investigate both global and spatial significant mediation effects. Finally, our framework is applied to the corpus callosum shape data from the Alzheimer's Disease Neuroimaging Initiative.
more » « less
Free, publicly-accessible full text available August 1, 2026
Finite Sample Valid Inference via Calibrated Bootstrap

Jiang, Yiran; Liu, Chuanhai; Zhang, Heping (August 2024, arXiv.org)

While widely used as a general method for uncertainty quantification, the bootstrap method encounters difficulties that raise concerns about its validity in practical applications. This paper introduces a new resampling-based method, termed calibrated bootstrap, designed to generate finite sample-valid parametric inference from a sample of size n. The central idea is to calibrate an m-out-of-n resampling scheme, where the calibration parameter m is determined against inferential pivotal quantities derived from the cumulative distribution functions of loss functions in parameter estimation. The method comprises two algorithms. The first, named resampling approximation (RA), employs a stochastic approximation algorithm to find the value of the calibration parameter m=mα for a given α in a manner that ensures the resulting m-out-of-n bootstrapped 1−α confidence set is valid. The second algorithm, termed distributional resampling (DR), is developed to further select samples of bootstrapped estimates from the RA step when constructing 1−α confidence sets for a range of α values is of interest. The proposed method is illustrated and compared to existing methods using linear regression with and without L1 penalty, within the context of a high-dimensional setting and a real-world data application. The paper concludes with remarks on a few open problems worthy of consideration.
more » « less
Full Text Available
Heterogeneity Analysis on Multi-State Brain Functional Connectivity and Adolescent Neurocognition

https://doi.org/10.1080/01621459.2024.2311363

Wang, Shiying; Constable, Todd; Zhang, Heping; Zhao, Yize (April 2024, Journal of the American Statistical Association)

Full Text Available
Statistics and AI: A Fireside Conversation

https://doi.org/10.1162/99608f92.c066fe9c

Lin, Xihong; Cai, Tianxi; Donoho, David; Fu, Haoda; Ke, Tracy; Jin, Jiashun; Meng, Xiao-Li; Qu, Annie; Shi, Chengchun; Song, Peter; et al (January 2025, Harvard data science review)

Full Text Available
Nonparametric Two-Sample Tests of High Dimensional Mean Vectors via Random Integration

https://doi.org/10.1080/01621459.2022.2141636

Jiang, Yunlu; Wang, Xueqin; Wen, Canhong; Jiang, Yukang; Zhang, Heping (January 2024, Journal of the American Statistical Association)

Full Text Available
Nonparametric Statistical Inference via Metric Distribution Function in Metric Spaces

https://doi.org/10.1080/01621459.2023.2277417

Wang, Xueqin; Zhu, Jin; Pan, Wenliang; Zhu, Junhao; Zhang, Heping (December 2023, Journal of the American Statistical Association)

The distribution function is essential in statistical inference and connected with samples to form a directed closed loop by the correspondence theorem in measure theory and the Glivenko-Cantelli and Donsker properties. This connection creates a paradigm for statistical inference. However, existing distribution functions are defined in Euclidean spaces and are no longer convenient to use in rapidly evolving data objects of complex nature. It is imperative to develop the concept of the distribution function in a more general space to meet emerging needs. Note that the linearity allows us to use hypercubes to define the distribution function in a Euclidean space. Still, without the linearity in a metric space, we must work with the metric to investigate the probability measure. We introduce a class of metric distribution functions through the metric only. We overcome this challenging step by proving the correspondence theorem and the Glivenko-Cantelli theorem for metric distribution functions in metric spaces, laying the foundation for conducting rational statistical inference for metric space-valued data. Then, we develop a homogeneity test and a mutual independence test for non-Euclidean random objects and present comprehensive empirical evidence to support the performance of our proposed methods. Supplementary materials for this article are available online.
more » « less
Full Text Available
Optimal and Safe Estimation for High-Dimensional Semi-Supervised Learning

https://doi.org/10.1080/01621459.2023.2277409

Deng, Siyi; Ning, Yang; Zhao, Jiwei; Zhang, Heping (November 2023, Journal of the American Statistical Association)

Full Text Available
Five Critical Gene-Based Biomarkers With Optimal Performance for Hepatocellular Carcinoma

https://doi.org/10.1177/11769351231190477

Liu, Yongjun; Zhang, Heping; Xu, Yuqing; Liu, Yao-Zhong; Al-Adra, David P; Yeh, Matthew M; Zhang, Zhengjun (January 2023, Cancer Informatics)

Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world. There is an urgent need to understand the molecular background of HCC to facilitate the identification of biomarkers and discover effective therapeutic targets. Published transcriptomic studies have reported a large number of genes that are individually significant for HCC. However, reliable biomarkers remain to be determined. In this study, built on max-linear competing risk factor models, we developed a machine learning analytical framework to analyze transcriptomic data to identify the most miniature set of differentially expressed genes (DEGs). By analyzing 9 public whole-transcriptome datasets (containing 1184 HCC samples and 672 nontumor controls), we identified 5 critical differentially expressed genes (DEGs) (ie, CCDC107, CXCL12, GIGYF1, GMNN, and IFFO1) between HCC and control samples. The classifiers built on these 5 DEGs reached nearly perfect performance in identification of HCC. The performance of the 5 DEGs was further validated in a US Caucasian cohort that we collected (containing 17 HCC with paired nontumor tissue). The conceptual advance of our work lies in modeling gene-gene interactions and correcting batch effect in the analytic framework. The classifiers built on the 5 DEGs demonstrated clear signature patterns for HCC. The results are interpretable, robust, and reproducible across diverse cohorts/populations with various disease etiologies, indicating the 5 DEGs are intrinsic variables that can describe the overall features of HCC at the genomic level. The analytical framework applied in this study may pave a new way for improving transcriptome profiling analysis of human cancers.
more » « less
Full Text Available
Depth importance in precision medicine (DIPM): a tree- and forest-based method for right-censored survival outcomes

https://doi.org/10.1093/biostatistics/kxaa021

Chen, Victoria; Zhang, Heping (May 2020, Biostatistics)

Summary Many clinical trials have been conducted to compare right-censored survival outcomes between interventions. Such comparisons are typically made on the basis of the entire group receiving one intervention versus the others. In order to identify subgroups for which the preferential treatment may differ from the overall group, we propose the depth importance in precision medicine (DIPM) method for such data within the precision medicine framework. The approach first modifies the split criteria of the traditional classification tree to fit the precision medicine setting. Then, a random forest of trees is constructed at each node. The forest is used to calculate depth variable importance scores for each candidate split variable. The variable with the highest score is identified as the best variable to split the node. The importance score is a flexible and simply constructed measure that makes use of the observation that more important variables tend to be selected closer to the root nodes of trees. The DIPM method is primarily designed for the analysis of clinical data with two treatment groups. We also present the extension to the case of more than two treatment groups. We use simulation studies to demonstrate the accuracy of our method and provide the results of applications to two real-world data sets. In the case of one data set, the DIPM method outperforms an existing method, and a primary motivation of this article is the ability of the DIPM method to address the shortcomings of this existing method. Altogether, the DIPM method yields promising results that demonstrate its capacity to guide personalized treatment decisions in cases with right-censored survival outcomes.
more » « less
Full Text Available
A Super Scalable Algorithm for Short Segment Detection

https://doi.org/10.1007/s12561-020-09278-z

Hao, Ning; Niu, Yue Selena; Xiao, Feifei; Zhang, Heping (April 2020, Statistics in Biosciences)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records